The MetaCycle package is mainly used for detecting rhythmic signals from large scale time-series data. It incorporates ARSER(ARS), JTK_CYCLE(JTK), and Lomb-Scargle(LS) properly for periodic signal detection, and could also output integrated analysis results if required.
The usual time-series data is evenly sampled once at each time point, and the interval between neighbour time points is integer. Not all data are as simple as this. There are datasets with replicate samples, or with missing values, or un-evenly sampled, or sampled with a non-integer interval. Examples of these types of data are shown in the below table.
| Data Type | Point 1 | Point 2 | Point 3 | Point 4 | Point 5 | Point 6 |
|---|---|---|---|---|---|---|
| The usual data | CT0 | CT4 | CT8 | CT12 | CT16 | CT20 |
| With missing value | CT0 | NA | CT8 | CT12 | CT16 | CT20 |
| With replicates | CT0 | CT0 | CT8 | CT8 | CT16 | CT16 |
| With un-even interval | CT0 | CT2 | CT8 | CT10 | CT16 | CT20 |
| With non-integer interval | CT0 | CT4.5 | CT9 | CT13.5 | CT18 | CT22.5 |
Of course, some datasets may seem combination of two or more of above types of data.
| Data Type | Point 1 | Point 2 | Point 3 | Point 4 | Point 5 | Point 6 |
|---|---|---|---|---|---|---|
| With replicates and missing value | CT0 | CT0 | CT8 | NA | CT16 | CT16 |
| With un-even interval and replicates | CT0 | CT2 | CT2 | CT10 | CT16 | CT20 |
The meta2d function in MetaCycle is designed to analyze differnt types of time-series datasets, and it could automatically select proper method to analyze different types of input datasets. The implementation strategy used for meta2d is shown in the flow chart.
In addition to selecting proper methods to analyze different kinds of datasets, meta2d could also output integrated results from multiple methods. Detail explaination about integration stragegies is in the vignettes of MetaCycle.
meta2d recalculates the amplitude with following model: \[Y_i = B + TRE*(t_i - \frac{\sum_{i=1}^n t_i}{n}) + A*cos(2*\pi*\frac{t_i - PHA}{PER})\]
The baseline and trend level are explained in the below example.
| Column name | Description | Column name | Description |
|---|---|---|---|
| ARS_pvalue | pvalue from ARS | LS_BH.Q | FDR from LS |
| ARS_BH.Q | FDR from ARS | LS_period | period from LS |
| ARS_period | period from ARS | LS_adjphase | adjusted phase from LS |
| ARS_adjphase | adjusted phase from ARS | LS_amplitude | amplitude from LS |
| ARS_amplitude | amplitude from ARS | meta2d_pvalue | integrated pvalue |
| JTK_pvalue | pvalue from JTK | meta2d_BH.Q | FDR based on integrated pvalue |
| JTK_BH.Q | FDR from JTK | meta2d_period | averaged period |
| JTK_period | period from JTK | meta2d_phase | integrated phase |
| JTK_adjphase | adjusted phase from JTK | meta2d_Base | baseline value given by meta2d |
| JTK_amplitude | amplitude from JTK | meta2d_AMP | amplitude given by meta2d |
| LS_pvalue | pvalue from LS | meta2d_rAMP | relative amplitude |
# load 'shiny' package
library("shiny")
# load 'MetaCycle' package
library("MetaCycle")
# load 'dplyr' package
library("dplyr")
# load 'ggplot2' package
library("ggplot2")
# load 'cowplot' package
library("cowplot")
# change working directory to the desktop
setwd("~/Desktop")
# check required directories under the desktop
dir()
# load shiny package
library("shiny")
# use 'runGitHub' function of shiny package
runGitHub("MetaCycleApp", "gangwug")
# load shiny package
library("shiny")
# use 'runApp' function of shiny package
runApp("~/Desktop/MetaCycleApp-master")
# load dplyr package
library("dplyr")
# read in the 'meta2d_experientA.csv' file
dataD <- read.csv("~/Desktop/SRBR_SMTSAworkshop-master/result/meta2d_experimentA.csv", stringsAsFactors = FALSE)
# take a look at column names of the data
colnames(dataD)
## [1] "CycID" "JTK_pvalue" "JTK_BH.Q" "JTK_period"
## [5] "JTK_adjphase" "JTK_amplitude" "LS_pvalue" "LS_BH.Q"
## [9] "LS_period" "LS_adjphase" "LS_amplitude" "meta2d_pvalue"
## [13] "meta2d_BH.Q" "meta2d_period" "meta2d_phase" "meta2d_Base"
## [17] "meta2d_AMP" "meta2d_rAMP" "X18" "X20"
## [21] "X22" "X24" "X26" "X28"
## [25] "X30" "X32" "X34" "X36"
## [29] "X38" "X40" "X42" "X44"
## [33] "X46" "X48" "X50" "X52"
## [37] "X54" "X56" "X58" "X60"
## [41] "X62" "X64"
# filter the data by meta2d_BH.Q < 0.05 by 'filter' function of dplyr
cirD <- filter(dataD, meta2d_BH.Q < 0.05)
# see the number of circadian transcripts
nrow(cirD)
## [1] 480
# select "CycID", "meta2d_period", "meta2d_phase", and "X18" to "X64" columns for drawing heatmap by 'select' function of dplyr package
figureD <- select(cirD, CycID, meta2d_period, meta2d_phase, num_range("X", seq(18, 64, by=2), width = 2))
# load 'heatmapF' and 'phaseHist' function used to draw heatmap and histogram
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get the heatmap figure by 'heatmapF' function
heatmapFigure <- heatmapF(inputD = figureD, minfold=0.5, maxfold=2)
heatmapFigure
# get the phase distribtuion figure by 'phaseHist' function
histFigure <- phaseHist(inputD = figureD, binvalue=seq(0,24,by=2), histcol = "blue")
histFigure
# take a look at column names of 'cirD' dataframe
colnames(cirD)
## [1] "CycID" "JTK_pvalue" "JTK_BH.Q" "JTK_period"
## [5] "JTK_adjphase" "JTK_amplitude" "LS_pvalue" "LS_BH.Q"
## [9] "LS_period" "LS_adjphase" "LS_amplitude" "meta2d_pvalue"
## [13] "meta2d_BH.Q" "meta2d_period" "meta2d_phase" "meta2d_Base"
## [17] "meta2d_AMP" "meta2d_rAMP" "X18" "X20"
## [21] "X22" "X24" "X26" "X28"
## [25] "X30" "X32" "X34" "X36"
## [29] "X38" "X40" "X42" "X44"
## [33] "X46" "X48" "X50" "X52"
## [37] "X54" "X56" "X58" "X60"
## [41] "X62" "X64"
# if it reports error after typing above command, please re-run the code of preparing 'figureD' in the first part of 3.5 section of this demo
# select its 'CycID', "meta2d_BH.Q", 'meta2d_phase' columns for later analysis
phaseD <- select(cirD, CycID, meta2d_BH.Q, meta2d_phase)
# read in the annotation file-'annoDbiMainInfMouse4302.txt' in the 'data-raw' directory
annoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/Mouse4302ProbeAnnotation.txt", stringsAsFactors = FALSE)
# add annotation information with 'inner_join' function of dplyr package
phaseD <- inner_join(phaseD, annoD, by=c("CycID" = "PROBEID"))
# take a look at the phaseD
head(phaseD)
## CycID meta2d_BH.Q meta2d_phase ENTREZID SYMBOL
## 1 1439401_x_at 0.0033677682 11.74247 26932 Ppp2r5e
## 2 1456766_at 0.0008827375 18.59238 77609 Ccdc151
## 3 1438743_at 0.0060070233 15.34830 13122 Cyp7a1
## 4 1443849_x_at 0.0107281617 16.22043 22275 Urod
## 5 1424118_a_at 0.0012919968 13.08001 66442 Spc25
## 6 1416214_at 0.0055280767 21.68141 17217 Mcm4
# filter out those probesets without annotation information
phaseD <- filter(phaseD, SYMBOL != "NA")
# select 'SYMBOL', 'meta2d_BH.Q' and 'meta2d_phase' columns for getting a dataframe without duplicate gene names
ori_phaseD <- select(phaseD, SYMBOL, meta2d_BH.Q, meta2d_phase)
# take a look at the row number with possilbe duplicate gene names
dim(ori_phaseD)
## [1] 479 3
# load the 'uniF' function from 'fig.R' for doing this work
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")
# get a dataframe without duplicate gene names
uni_phaseD <- uniF(ori_phaseD)
# take a look at the rownumber now
dim(uni_phaseD)
## [1] 453 3
# select 'SYMBOL' and 'meta2d_phase' columns for PSEA analysis
pseaD <- select(uni_phaseD, SYMBOL, meta2d_phase)
# take a look at the pseaD
head(pseaD)
## SYMBOL meta2d_phase
## 1 1190002N15Rik 15.770103
## 2 1700001C19Rik 6.803986
## 3 1700023H06Rik 6.099139
## 4 2700094K13Rik 9.992612
## 5 4930507D05Rik 15.712746
## 6 4932438A13Rik 22.682055
# write the 'pseaD' dataframe to a txt file-'experimentPSEA.txt' in 'result' directory
write.table(pseaD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt", sep = "\t", quote = FALSE, row.names = FALSE)
# read in the 'experimentPSEA.txt' file
ori_pseaD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt", stringsAsFactors = FALSE)
# take a look at the data
head(ori_pseaD)
## SYMBOL meta2d_phase
## 1 1190002N15Rik 15.770103
## 2 1700001C19Rik 6.803986
## 3 1700023H06Rik 6.099139
## 4 2700094K13Rik 9.992612
## 5 4930507D05Rik 15.712746
## 6 4932438A13Rik 22.682055
# read in the 'MouseHumanHomolog.txt' file in 'data-raw' directory of SRBR_SMTSAworkshop-master
homoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/MouseHumanHomolog.txt", stringsAsFactors = FALSE)
# take a look at the data
head(homoD)
## Mouse_gene Human_gene
## 1 Acadm ACADM
## 2 Acadvl ACADVL
## 3 Acat1 ACAT1
## 4 Acvr1 ACVR1
## 5 Sgca SGCA
## 6 Adsl ADSL
# join mouse gene and human homolog gene with 'inner_join' function
homo_pseaD <- inner_join(ori_pseaD, homoD, by=c("SYMBOL" = "Mouse_gene"))
# take a look at the joined dataframe
head(homo_pseaD)
## SYMBOL meta2d_phase Human_gene
## 1 1190002N15Rik 15.770103 C3orf58
## 2 1700001C19Rik 6.803986 LOC102723903
## 3 2700094K13Rik 9.992612 C11orf31
## 4 4932438A13Rik 22.682055 KIAA1109
## 5 A230050P20Rik 1.119342 C19orf66
## 6 Abat 4.197369 ABAT
# select "Human_gene" and "meta2d_phase" columns for PSEA analysis
outD <- select(homo_pseaD, Human_gene, meta2d_phase)
# write it to a file named-'human_experimentPSEA.txt' in 'result' directory of SRBR_SMTSAworkshop-master
write.table(outD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/human_experimentPSEA.txt", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
## C3orf58 15.7701032313443
## LOC102723903 6.80398554001375
## C11orf31 9.99261173937095
## KIAA1109 22.6820554336583
## C19orf66 1.11934172507805